The mutual information between stimulus and spike-train response is commonly used to monitor neural coding efficiency, but neuronal computation broadly conceived requires more refined and targeted information measures of input-output joint processes. A first step towards that larger goal is to develop information measures for individual output processes, including information generation (entropy rate), stored information (statistical complexity), predictable information (excess entropy), and active information accumulation (bound information rate). We calculate these for spike trains generated by a variety of noise-driven integrate-and-fire neurons as a function of time resolution and for alternating renewal processes. We show that their time-resolution dependence reveals coarse-grained structural properties of interspike interval statistics; e.g., $\tau$-entropy rates that diverge less quickly than the firing rate indicate interspike interval correlations. We also find evidence that the excess entropy and regularized statistical complexity of different types of integrate-and-fire neurons are universal in the continuous-time limit in the sense that they do not depend on mechanism details. This suggests a surprising simplicity in the spike trains generated by these model neurons. Interestingly, neurons with gamma-distributed ISIs and neurons whose spike trains are alternating renewal processes do not fall into the same universality class. These results lead to two conclusions. First, the dependence of information measures on time resolution reveals mechanistic details about spike train generation. Second, information measures can be used as model selection tools for analyzing spike train processes.
I. INTRODUCTION

Despite a half century of concerted effort [1], neuroscientists continue to debate the relevant timescales of neuronal communication as well as the basic coding schemes at work in the cortex, even in early sensory processing regions of the brain thought to be dominated by feedforward pathways [2Hin]. For example, the apparent variability of neural responses to repeated presentations of sensory stimuli has led many to conclude that the brain must average across tens or hundreds of milliseconds or across large populations of neurons to extract a meaningful signal m-. By contrast, reports of reliable responses suggest shorter relevant timescales and more nuanced coding schemes [MU]. In fact, there is evidence for different characteristic timescales for neural coding in different primary sensory regions of the cortex [TS]. In addition to questions about the relevant timescales of neural communication, there has been an ongoing debate regarding the magnitude and importance of correlations among the spiking responses of neural populations [TBIEO].

* smarzen@berkeley.edu
† deweese@berkeley.edu
‡ chaos@ucdavis.edu

Most studies of neural coding focus on the relationship between a sensory stimulus and the neural response. Others consider the relationship between the neural response and the animal’s behavioral response m, the relationship between pairs or groups of neurons at different stages of processing [53], or the variability of neural responses themselves without regard to other variables [50]. Complementing the latter studies, we are interested in quantifying the randomness and predictability of neural responses without reference to stimulus. We consider the variability of a given neuron’s activity at one time and how this is related to the same neuron’s activity at other times in the future and the past.

Along these lines, information theory [5^155] provides an insightful and rich toolset for interpreting neural data and for formulating theories of communication and computation in the nervous system [26]. In particular, Shannon’s mutual information has developed into a powerful probe that quantifies the amount of information about a sensory stimulus encoded by neural activity.

Similarly, the Shannon entropy has been used to quantify the variability of the resulting spike-train response. In contrast to these standard stimulus- and response-averaged quantities, a host of other information-theoretic measures have been applied in neuroscience, such as the Fisher information [25] and various measures of the information gained per observation [37, 38].

We take an approach that complements more familiar informational analyses. First, we consider “output-only” processes, since their analysis is a theoretical prerequisite to understanding information in the stimulus-response paradigm. Second, we analyze rates of informational divergence, not only nondivergent components. Indeed, we show that divergences, rather than being a kind of mathematical failure, are important and revealing features of information processing in spike trains.

We are particularly interested in the information content of neural spiking on fine timescales. How is information encoded in spike timing and, more specifically, in interspike intervals? In this regime, the critical questions turn on determining the kind of information encoded and the required “accuracy” of individual spike timing to support it. At present, unfortunately, characterizing communication at submillisecond time scales and below remains computationally and theoretically challenging.

Practically, a spike train is converted into a binary sequence for analysis by choosing a time bin size and counting the number of spikes in successive time bins. Notwithstanding Refs. EH SO], there are few studies of how estimates of communication properties change as a function of time bin size, though there are examples of both short m and long [351 112] time expansions. Said most plainly, it is difficult to directly calculate the most basic quantities (e.g., communication rates between stimulus and spike-train response) in the submillisecond regime, despite progress on undersampling [33H15]. Beyond the practical, the challenges are also conceptual. For example, given that a stochastic process’ entropy rate diverges in a process-characteristic fashion for small time discretizations [46], measures of communication efficacy require careful interpretation in this limit.

Compounding the need for better theoretical tools, measurement techniques will soon amass enough data to allow serious study of neuronal communication at fine time resolutions and across large populations [47]. In this happy circumstance, we will need guideposts for how information measures of neuronal communication vary with time resolution so that we can properly interpret the empirical findings and refine the design of nanoscale probes.


Many single-neuron models generate neural spike trains that are renewal processes |1H]. Starting from this observation, we use recent results [15] to determine how information measures scale in the small time-resolution limit. This is exactly the regime where numerical methods are most likely to fail due to undersampling and, thus, where analytic formulae are most useful. We also extend the previous analyses to structurally more complex, alternating renewal processes and analyze the time-resolution scaling of their information measures. This yields important clues as to which scaling results apply more generally. We then show that, across several standard neuronal models, the information measures are universal in the sense that their scaling does not depend on the details of spike-generation mechanisms.

Several information measures we consider are already common fixtures in theoretical neuroscience, such as Shannon’s source entropy rate [39, 40]. Others have appeared at least once, such as the finite-time excess entropy (or predictive information) [50, 51] and statistical complexity [52]. And others have not yet been applied, such as the bound information [53, 54].

The development proceeds as follows. Section II reviews notation and definitions. To investigate the dependence of causal information measures on time resolution, Sec. III studies a class of renewal processes motivated by their wide use in describing neuronal behavior. Section IV then explores the time-resolution scaling of information measures of alternating renewal processes, identifying those scalings likely to hold generally. Section V evaluates continuous-time limits of these information measures for common single-neuron models. This reveals a new kind of universality in which the information measures’ scaling is independent of detailed spiking mechanisms. Taken altogether, the analyses provide intuition and motivation for several of the rarely used but key informational quantities. For example, the informational signatures of integrate-and-fire model neurons differ from both simpler, gamma-distributed processes and more complex, compound renewal processes. Finally, Sec. VI summarizes the results, giving a view to future directions and mathematical and empirical challenges.


II. BACKGROUND 

We can only briefly review the relevant physics of information. Much of the phrasing is taken directly from background presented in Refs. iniiss].

Let us first recall the causal state definitions [55] and information measures of discrete-time, discrete-state processes introduced in Refs. [53, 57]. The main object of study is a process $\mathcal{P}$: the list of all of a system’s behaviors or realizations $\{\ldots, x_{-2}, x_{-1}, x_0, x_1, \ldots\}$ and their probabilities, specified by the joint distribution $\Pr(\ldots, X_{-2}, X_{-1}, X_0, X_1, \ldots)$. We denote a contiguous chain of random variables as $X_{0:L} = X_0 X_1 \cdots X_{L-1}$.

We assume the process is ergodic and stationary, i.e., $\Pr(X_{0:L}) = \Pr(X_{t:L+t})$ for all $t \in \mathbb{Z}$, and that the measurement symbols range over a finite alphabet: $x \in \mathcal{A}$. In this setting, the present $X_0$ is the random variable measured at $t = 0$, the past is the chain $X_{:0} = \ldots X_{-2} X_{-1}$ leading up to the present, and the future is the chain following the present $X_{1:} = X_1 X_2 \cdots$. (We suppress the infinite index in these.)

As the Introduction noted, many information-theoretic studies of neural spike trains concern input-output information measures that characterize stimulus-response properties; e.g., the mutual information between stimulus and resulting spike train. In the absence of stimulus or even with a nontrivial stimulus, we can still study neural activity from an information-theoretic point of view using “output-only” information measures that quantify intrinsic properties of neural activity alone:

• How random is it? The entropy rate $h_\mu = H[X_0 | X_{:0}]$, which is the entropy in the present observation conditioned on all past observations [25].

• What must be remembered about the past to optimally predict the future? The causal states $\mathcal{S}^+$, which are groupings of pasts that lead to the same probability distribution over future trajectories j^Esj.

• How much memory is required to store the causal states? The statistical complexity $C_\mu = H[\mathcal{S}^+]$, or the entropy of the causal states [58].

• How much of the future is predictable from the past? The excess entropy $\mathbf{E} = I[X_{:0}; X_{0:}]$, which is the mutual information between the past and the future [51].

• How much of the generated information ($h_\mu$) is relevant to predicting the future? The bound information $b_\mu = I[X_0; X_{1:} | X_{:0}]$, which is the mutual information between the present and future observations conditioned on all past observations [53].

• How much of the generated information is useless, in that it neither affects future behavior nor contains information about the past? The ephemeral information $r_\mu = H[X_0 | X_{:0}, X_{1:}]$, which is the entropy in the present observation conditioned on all past and future observations [53].


The information diagram of Fig. 1 illustrates the relationship between $h_\mu$, $r_\mu$, $b_\mu$, and $\mathbf{E}$. When we change the time discretization $\Delta t$, our interpretation and definitions change somewhat, as we describe in Sec. III.

Shannon’s various information quantities (entropy, conditional entropy, mutual information, and the like), when applied to time series, are functions of the joint distributions $\Pr(X_{0:L})$. Importantly, they define an algebra of information measures for a given set of random variables [59]. Ref. [53] used this to show that the past and future partition the single-measurement entropy $H[X_0]$ into the measure-theoretic atoms of Fig. 1. These include those already mentioned, $r_\mu$ and $b_\mu$, and the enigmatic information:

$$ q_\mu = I[X_{:0}; X_0; X_{1:}] , $$

which is the co-information between past, present, and future. One can also consider the amount of predictable information not captured by the present:

$$ \sigma_\mu = I[X_{:0}; X_{1:} | X_0] , $$

which is the elusive information [60]. It measures the amount of past-future correlation not contained in the present. It is nonzero if the process has “hidden states” and is therefore quite sensitive to how the state space is “observed” or coarse-grained.

The total information in the future predictable from the past (or vice versa), the excess entropy, decomposes into particular atoms:

$$ \mathbf{E} = b_\mu + q_\mu + \sigma_\mu . $$

The process’s Shannon entropy rate is also a sum of atoms:

$$ h_\mu = r_\mu + b_\mu . $$

This tells us that a portion of the information ($h_\mu$) a process spontaneously generates is thrown away ($r_\mu$) and a portion is actively stored ($b_\mu$). Putting these observations together gives the information anatomy of a single measurement $X_0$:

$$ H[X_0] = q_\mu + 2 b_\mu + r_\mu . \tag{1} $$

Although these measures were originally defined for stationary processes, they easily carry over to a nonstationary process of finite Markov order.
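
The information anatomy of a single measurement can be checked numerically on a small example. The following is a minimal sketch, not taken from the references: it assumes a stationary binary first-order Markov chain (a hypothetical stand-in for a binned spike train), for which the single symbols $X_{-1}$ and $X_1$ carry everything the full past and future say about $X_0$. The function and parameter names (`markov_anatomy`, `p01`, `p10`) are illustrative only.

```python
import math
from itertools import product

def entropy(dist):
    """Shannon entropy (bits) of a dict mapping outcomes to probabilities."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def marginal(joint, keep):
    """Marginalize a joint distribution over tuples onto the positions in `keep`."""
    out = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out

def markov_anatomy(p01, p10):
    """Information atoms of a stationary binary first-order Markov chain.

    p01 = Pr(next=1 | current=0), p10 = Pr(next=0 | current=1).
    Assumption: first-order Markov, so three consecutive symbols suffice.
    """
    T = {0: {0: 1 - p01, 1: p01}, 1: {0: p10, 1: 1 - p10}}
    pi = {0: p10 / (p01 + p10), 1: p01 / (p01 + p10)}
    # Joint distribution over (X_{-1}, X_0, X_1).
    joint = {(a, b, c): pi[a] * T[a][b] * T[b][c]
             for a, b, c in product((0, 1), repeat=3)}
    H = lambda keep: entropy(marginal(joint, keep))
    H0 = H((1,))                        # single-measurement entropy H[X_0]
    h_mu = H((0, 1)) - H((0,))          # H[X_0 | past]
    r_mu = H((0, 1, 2)) - H((0, 2))     # H[X_0 | past, future]
    b_mu = h_mu - r_mu                  # from h_mu = r_mu + b_mu
    q_mu = H0 - 2 * b_mu - r_mu         # from Eq. (1)
    E = H((0,)) + H((1,)) - H((0, 1))   # I[past; future] for a Markov chain
    return {"H0": H0, "h_mu": h_mu, "r_mu": r_mu,
            "b_mu": b_mu, "q_mu": q_mu, "E": E}

atoms = markov_anatomy(p01=0.3, p10=0.6)
for name, value in atoms.items():
    print(f"{name:5s} = {value:.4f} bits")
```

Since a first-order Markov chain has no hidden states, its elusive information vanishes, so the independently computed excess entropy should equal $b_\mu + q_\mu$; this gives a simple consistency check on the decomposition.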

Calculating these information measures in closed form given a model requires finding the $\epsilon$-machine, which is constructed from causal states. Forward-time causal states $\mathcal{S}^+$ are minimal sufficient statistics for predicting a process’s future [SS] |SS]. This follows from their definition: a causal state $\sigma^+ \in \mathcal{S}^+$ is a set of pasts grouped by the equivalence relation $\sim^+$:

$$ x_{:0} \sim^+ x'_{:0} \iff \Pr(X_{0:} | X_{:0} = x_{:0}) = \Pr(X_{0:} | X_{:0} = x'_{:0}) . \tag{2} $$

So, $\mathcal{S}^+$ is a set of classes, a coarse-graining of the uncountably infinite set of all pasts. At time $t$, we have the random variable $\mathcal{S}_t^+$ that takes values $\sigma^+ \in \mathcal{S}^+$ and describes the causal-state process $\ldots, \mathcal{S}_{-1}^+, \mathcal{S}_0^+, \mathcal{S}_1^+, \ldots$. $\mathcal{S}_t^+$ is a partition of pasts $X_{:t}$ that, according to the indexing convention, does not include the present observation $X_t$. In addition to the set of pasts leading to it, a causal state $\sigma^+$ has an associated future morph: the conditional measure $\Pr(X_{t:} | \sigma^+)$ of futures that can be generated from it. Moreover, each state $\sigma^+$ inherits a probability $\pi(\sigma^+)$ from the process’s measure over pasts $\Pr(X_{:t})$. The forward-time statistical complexity is then the Shannon entropy of the state distribution $\pi(\sigma^+)$ [55]: $C_\mu^+ = H[\mathcal{S}^+]$. A generative model is constructed out of the causal states by endowing the causal-state process with transitions:

$$ T^{(x)}_{\sigma \sigma'} = \Pr(\mathcal{S}_{t+1}^+ = \sigma', X_t = x | \mathcal{S}_t^+ = \sigma) , $$

that give the probability of generating the next symbol $x$ and ending in the next state $\sigma'$, if starting in state $\sigma$. (Residing in a state and generating a symbol do not occur simultaneously. Since symbols are generated during transitions there is, in effect, a half time-step difference in the indexes of the random variables $X_t$ and $\mathcal{S}_t^+$. We suppress notating this.) To summarize, a process’s forward-time $\epsilon$-machine is the tuple $\{\mathcal{A}, \mathcal{S}^+, \{T^{(x)} : x \in \mathcal{A}\}\}$.

For a discrete-time, discrete-alphabet process, the $\epsilon$-machine is its minimal unifilar hidden Markov model (HMM) [55] [55j. (For general background on HMMs see [5TVI55].) Note that the causal state set can be finite, countable, or uncountable; the latter two cases can occur even for processes generated by finite-state HMMs. Minimality can be defined by either the smallest number of states or the smallest entropy over states [55].

Unifilarity is a constraint on the transition matrices such that the next state $\sigma'$ is determined by knowing the current state $\sigma$ and the next symbol $x$. That is, if the transition exists, then $\Pr(\mathcal{S}_{t+1}^+ | X_t = x, \mathcal{S}_t^+ = \sigma)$ has support on a single causal state.
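
To make the unifilarity constraint and the state-based measures concrete, here is a small sketch that checks unifilarity of a set of symbol-labeled transition matrices and then computes the statistical complexity and entropy rate from the stationary state distribution. The example machine is the two-state Golden Mean process (no two consecutive 0s), chosen here purely for illustration; it is not one of the spike-train models analyzed in this paper, and the data layout `T[x][i][j]` is an assumption of the sketch.

```python
import math

def is_unifilar(T):
    """True if every (state, symbol) pair leads to at most one next state."""
    return all(sum(1 for p in row if p > 0) <= 1
               for matrix in T.values() for row in matrix)

def stationary_distribution(T, iters=2000):
    """Stationary state distribution by power iteration on the summed matrices."""
    n = len(next(iter(T.values())))
    M = [[sum(T[x][i][j] for x in T) for j in range(n)] for i in range(n)]
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * M[i][j] for i in range(n)) for j in range(n)]
    return pi

def statistical_complexity(T):
    """C_mu = H[S]: Shannon entropy (bits) of the stationary state distribution."""
    return -sum(p * math.log2(p) for p in stationary_distribution(T) if p > 0)

def entropy_rate(T):
    """h_mu for a unifilar HMM: state-averaged entropy of the emitted symbol."""
    h = 0.0
    for i, p_state in enumerate(stationary_distribution(T)):
        for x in T:
            p_sym = sum(T[x][i])
            if p_sym > 0:
                h -= p_state * p_sym * math.log2(p_sym)
    return h

# T[x][i][j] = Pr(next state j, symbol x | current state i); states A=0, B=1.
golden_mean = {
    1: [[0.5, 0.0], [1.0, 0.0]],
    0: [[0.0, 0.5], [0.0, 0.0]],
}
# A nonunifilar variant: from state 0, symbol 1 can lead to either state.
not_unifilar = {
    1: [[0.25, 0.25], [1.0, 0.0]],
    0: [[0.5, 0.0], [0.0, 0.0]],
}

print(is_unifilar(golden_mean), is_unifilar(not_unifilar))
print(statistical_complexity(golden_mean), entropy_rate(golden_mean))
```

The entropy-rate formula used here is valid only because the machine is unifilar; for a nonunifilar HMM the state-averaged symbol entropy generally overestimates $h_\mu$.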


III. INFINITESIMAL TIME RESOLUTION 

One often treats a continuous-time renewal process, such as a spike train from a noisy integrate-and-fire neuron, in a discrete-time setting [26]. With results of Ref. [49] in hand, we can investigate how artificial time binning affects estimates of a model neuron’s spike train’s randomness, predictability, and information storage in the limit of infinitesimal time resolution. This is exactly the limit in which analytic formulae for information measures are most useful. For example, as shown shortly in Fig. 3, they reveal that increasing the time resolution artificially increases the apparent range of temporal correlations.

FIG. 1. Information diagram illustrating the anatomy of the information $H[X_0]$ in a process’ single observation $X_0$ in the context of its past $X_{:0}$ and its future $X_{1:}$. Although the past entropy $H[X_{:0}]$ and the future entropy $H[X_{1:}]$ typically are infinite, space precludes depicting them as such. They do scale in a controlled way, however: $H[X_{-\ell:0}] \propto h_\mu \ell$ and $H[X_{1:\ell}] \propto h_\mu \ell$. The two atoms labeled $b_\mu$ are the same, since we consider only stationary processes. (After Ref. [53], with permission.)

Time-binned neural spike trains of noisy integrate-and-fire neurons have been studied for quite some time [1] and, despite that history, this is still an active endeavor [26, 64]. Our emphasis and approach differ, though. We do not estimate statistics or reconstruct models from simulated spike train data using nonparametric inference algorithms, e.g., as done in Ref. [55]. Rather, we ask how $\epsilon$-machines extracted from a spike train process and information measures calculated from them vary as a function of time coarse-graining. Our analytic approach highlights an important lesson about such studies in general: A process’ $\epsilon$-machine and information anatomy are sensitive to time resolution. A secondary and compensating lesson is that the manner in which the $\epsilon$-machine and information anatomy scale with time resolution conveys much about the process’ structure.

Suppose we are given a neural spike train with interspike intervals independently drawn from the same interspike interval (ISI) distribution $\phi(t)$ with mean ISI $1/\mu$. To convert the continuous-time point process into a sequence of binary spike-quiescence symbols, we track the number of spikes emitted in successive time bins of size $\Delta t$. Our goal, however, is to understand how the choice of $\Delta t$ affects reported estimates for $C_\mu$, $h_\mu$, $\mathbf{E}$, $b_\mu$, and $\sigma_\mu$. The way in which each of these varies with $\Delta t$ reveals information about the intrinsic time scales on which a process behaves; cf. the descriptions of entropy rates in Refs. [46l [Ml [66]. We concern ourselves with the infinitesimal $\Delta t$ limit, even though the behavior of these information atoms is potentially most interesting when $\Delta t$ is on the order of the process’ intrinsic time scales.

In the infinitesimal time-resolution limit, when $\Delta t$ is smaller than any intrinsic timescale, the neural spike train is a renewal process with interevent count distribution:

$$ F(n) \approx \phi(n \Delta t) \Delta t \tag{3} $$

and survival function:

$$ w(n) \approx \int_{n \Delta t}^{\infty} \phi(t) \, dt . \tag{4} $$

The interevent distribution $F(n)$ is the probability that the silence separating successive events (bins with spikes) is $n$ counts long, while the survival function $w(n)$ is the probability that the silence separating successive events is at least $n$ counts long. The $\epsilon$-machine transition probabilities therefore change with $\Delta t$. The mean interevent count $\langle T \rangle + 1$ is not the mean interspike interval $1/\mu$ since one must convert between counts and spikes [S7]:

$$ \langle T \rangle + 1 = \frac{1}{\mu \Delta t} . \tag{5} $$
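
The binning relations can be illustrated with a short sketch. It assumes, for simplicity, an exponential ISI distribution (a Poisson neuron) and builds the count-level survival function and interevent distribution directly from the continuous-time survival function $\Phi(t)$, taking $w(n) = \Phi(n \Delta t)$ and $F(n) = w(n) - w(n+1)$. That construction, the helper name `discretize_isi`, and the parameter values are assumptions of the sketch, not the procedure of Ref. [49].

```python
import math

def discretize_isi(survival, dt, n_max):
    """Count-level survival w(n) and interevent distribution F(n), n = 0..n_max-1.

    Assumes w(n) = Phi(n*dt) and F(n) = w(n) - w(n+1), where `survival` is the
    continuous-time ISI survival function Phi(t).
    """
    w = [survival(n * dt) for n in range(n_max + 1)]
    F = [w[n] - w[n + 1] for n in range(n_max)]
    return F, w[:n_max]

rate = 100.0   # firing rate mu in spikes per second (Poisson neuron)
dt = 1e-4      # time bin size in seconds
F, w = discretize_isi(lambda t: math.exp(-rate * t), dt, n_max=5000)

# Compare with the small-bin approximation F(n) ~ phi(n*dt)*dt.
n = 10
approx = rate * math.exp(-rate * n * dt) * dt
print("F(10) exact vs. approx:", F[n], approx)

# Mean interevent count versus 1/(mu*dt).
mean_count = sum(k * Fk for k, Fk in enumerate(F))
print("<T> + 1 =", mean_count + 1.0, " 1/(mu dt) =", 1.0 / (rate * dt))
```

For this choice $\mu \Delta t = 0.01$, so the small-bin approximations are expected to hold to within about one percent; the agreement degrades as $\Delta t$ approaches the mean ISI.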


In this limit, the $\epsilon$-machines of spike-train renewal processes can take one of the topologies described in Ref. [49].

Here, we focus only on two of these $\epsilon$-machine topologies. The first topology corresponds to that of an eventually Poisson process, in which the ISI distribution takes the form $\phi(t) = \phi(T) e^{-\lambda (t - T)}$ for $t > T$, for some finite $T$ and $\lambda > 0$. A Poisson neuron with firing rate $\lambda$ and refractory period of time $T$, for instance, eventually ($t > T$) generates a Poisson process. Hence, we refer to them as eventually Poisson processes. A Poisson process is a special type of eventually Poisson process with $T = 0$; see Fig. 2(a). However, the generic renewal process has the $\epsilon$-machine topology shown in Fig. 2(c). Technically, only noneventually-$\Delta$ Poisson processes have this $\epsilon$-machine topology, but for our purposes, this is the $\epsilon$-machine topology for any renewal process not generated by a Poisson neuron.

At present, inference algorithms can only infer finite $\epsilon$-machines. So, such algorithms applied to renewal processes will yield an eventually Poisson topology. (Compare Fig. 2(c) to the inferred approximate $\epsilon$-machine of an integrate-and-fire neuron in Fig. 2 in Ref. [M].) The generic renewal process has an infinite $\epsilon$-machine, though, for which the inferred $\epsilon$-machines are only approximations.


FIG. 2. $\epsilon$-Machines of processes generated by Poisson neurons and by integrate-and-fire neurons (left to right): (a) The $\epsilon$-machine for a Poisson process. (b) The $\epsilon$-machine for an eventually Poisson process; i.e., a Poisson neuron with a refractory period of length $\tilde{n} \Delta t$. (c) The $\epsilon$-machine for a generic renewal process, the not eventually $\Delta$-Poisson process of Ref. [49]; i.e., the process generated by noise-driven integrate-and-fire neurons. Edge labels $p|x$ denote emitting symbol $x$ (“1” is “spike”) with probability $p$. (Reprinted with permission from Ref. US].)


We calculated $\mathbf{E}$ and $C_\mu$ using the expressions given in Ref. [49]. Substituting in Eqs. (3), (4), and (5), we find that the excess entropy $\mathbf{E}$ tends to:

$$ \lim_{\Delta t \to 0} \mathbf{E}(\Delta t) = \int_0^\infty \mu t \phi(t) \log_2 (\mu \phi(t)) \, dt - 2 \int_0^\infty \mu \Phi(t) \log_2 (\mu \Phi(t)) \, dt , \tag{6} $$

where $\Phi(t) = \int_t^\infty \phi(t') \, dt'$ is the probability that an ISI is longer than $t$. It is easy to see that $\mathbf{E}(\Delta t)$ limits to a positive and (usually) finite value as the time resolution vanishes, with some exceptions described below. Similarly, using the expression in Ref. [49]’s App. H, one can show that the finite-time excess entropy $\mathbf{E}(T)$ [M] takes the form:

$$
\begin{aligned}
\lim_{\Delta t \to 0} \mathbf{E}(T) ={}& \left( \int_0^T \mu \Phi(t) \, dt \right) \log_2 \frac{1}{\mu} - 2 \int_0^T \mu \Phi(t) \log_2 \Phi(t) \, dt \\
& - \mu \int_T^\infty \Phi(t) \, dt \, \log_2 \left( \mu \int_T^\infty \Phi(t) \, dt \right) \\
& + \int_0^T \mu t \phi(t) \log_2 \phi(t) \, dt + T \int_T^\infty \mu \phi(t) \log_2 \phi(t) \, dt .
\end{aligned}
\tag{7}
$$

As $T \to \infty$, $\mathbf{E}(T) \to \mathbf{E}$. Note that these formulae apply only when the mean firing rate $\mu$ is nonzero.
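
As a numerical illustration of the continuous-time limit of the excess entropy, $\lim_{\Delta t \to 0} \mathbf{E} = \int_0^\infty \mu t \phi(t) \log_2(\mu \phi(t)) \, dt - 2 \int_0^\infty \mu \Phi(t) \log_2(\mu \Phi(t)) \, dt$, the sketch below evaluates both integrals with a simple midpoint rule. The two test densities (exponential, and gamma with shape 2), the truncation `t_max`, and the helper names are choices made for this sketch only. A Poisson process has no memory, so its value should vanish, while the gamma process should give a small positive value.

```python
import math

def integrate(f, a, b, n=20000):
    """Midpoint-rule quadrature; adequate for smooth, rapidly decaying integrands."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

def excess_entropy_limit(phi, Phi, mu, t_max, n=20000):
    """Continuous-time limit of E (bits) for ISI density phi and survival Phi."""
    def first(t):
        p = phi(t)
        return mu * t * p * math.log2(mu * p) if p > 0 else 0.0

    def second(t):
        s = mu * Phi(t)
        return s * math.log2(s) if s > 0 else 0.0

    return integrate(first, 0.0, t_max, n) - 2.0 * integrate(second, 0.0, t_max, n)

# Poisson neuron with unit rate: phi = exp(-t), Phi = exp(-t), mu = 1.
E_poisson = excess_entropy_limit(lambda t: math.exp(-t),
                                 lambda t: math.exp(-t),
                                 mu=1.0, t_max=60.0)

# Gamma ISI with shape 2 and unit scale: phi = t exp(-t), Phi = (1 + t) exp(-t),
# mean ISI 2, hence mu = 1/2.
E_gamma = excess_entropy_limit(lambda t: t * math.exp(-t),
                               lambda t: (1.0 + t) * math.exp(-t),
                               mu=0.5, t_max=60.0)

print("E (Poisson):", E_poisson)
print("E (gamma, shape 2):", E_gamma)
```

The midpoint rule is used only to keep the sketch dependency-free; for ISI densities with sharp features or heavy tails an adaptive quadrature routine would be a safer choice.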

Even if $\mathbf{E}$ limits to a finite value, the statistical complexity typically diverges due to its dependence on time discretization $\Delta t$. Suppose that we observe an eventually Poisson process, such that $\phi(t) = \phi(T) e^{-\lambda (t - T)}$ for $t > T$. Then, from formulae in Ref. [49], statistical complexity in the infinitesimal time resolution limit becomes:

$$
C_\mu(\Delta t) \sim \left( \mu \int_0^T \Phi(t) \, dt \right) \log_2 \frac{1}{\Delta t} - \int_0^T (\mu \Phi(t)) \log_2 (\mu \Phi(t)) \, dt - \left( \mu \int_T^\infty \Phi(t) \, dt \right) \log_2 \left( \mu \int_T^\infty \Phi(t) \, dt \right) ,
\tag{8}
$$

ignoring terms of $O(\Delta t)$ or higher. The first term diverges, and its rate of divergence is the probability of observing a time since last spike less than $T$. This measures the spike train’s deviation from being $\Delta$-Poisson and so reveals the effective dimension of the underlying causal state space. $C_\mu$’s remaining nondivergent component is equally interesting. In fact, it is the differential entropy of the time since last spike distribution.
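
The logarithmic divergence of the statistical complexity can be made concrete for the simplest eventually Poisson case. The sketch below assumes a Poisson neuron with firing rate $\lambda$ and a hard refractory (dead) time $T$, so that $\Phi(t) = 1$ for $t < T$ and $\Phi(t) = e^{-\lambda (t - T)}$ afterwards; under that assumption the three terms of the small-$\Delta t$ expression for $C_\mu$ have closed forms. The function name and parameter values are illustrative.

```python
import math

def cmu_dead_time_poisson(rate, dead_time, dt):
    """Small-dt statistical complexity (bits) of a Poisson neuron with dead time.

    Assumes Phi(t) = 1 on [0, T) and exp(-rate*(t - T)) for t >= T, so that
    mu = 1 / (T + 1/rate) is the mean firing rate.
    """
    mu = 1.0 / (dead_time + 1.0 / rate)
    p_recent = mu * dead_time   # mu * int_0^T Phi: last spike less than T ago
    p_tail = mu / rate          # mu * int_T^inf Phi: last spike at least T ago
    divergent = p_recent * math.log2(1.0 / dt)
    # -int_0^T (mu Phi) log2(mu Phi) dt with Phi = 1 on [0, T)
    differential = -p_recent * math.log2(mu)
    tail = -p_tail * math.log2(p_tail)
    return divergent + differential + tail

rate, dead_time = 100.0, 0.002   # 100 Hz drive, 2 ms refractory period
for dt in (1e-3, 1e-4, 1e-5):
    print(dt, cmu_dead_time_poisson(rate, dead_time, dt))
```

Each halving of $\Delta t$ adds exactly $\mu T$ bits in this approximation, which is the probability that the time since the last spike is shorter than the refractory period; with $T = 0$ the divergence disappears, as expected for a pure Poisson process.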

An immediate consequence of the analysis is that this generic infinitesimal renewal process is highly cryptic m-. It hides an arbitrarily large amount of its internal state information: $C_\mu$ diverges as $\Delta t \to 0$ but $\mathbf{E}$ (usually) asymptotes to a finite value. We have very structured processes that have disproportionately little in the future to predict. Periodic processes constitute an important exception to this general rule of thumb for continuous-time processes. A neuron that fires every $T$ seconds without jitter has $\mathbf{E} = C_\mu$, and both $\mathbf{E}$ and $C_\mu$ diverge logarithmically with $1/\Delta t$.

It is straightforward to show that any information measure contained within the present, $H[X_0]$, $h_\mu$, $b_\mu$, $r_\mu$, and $q_\mu$ (recall Fig. 1), vanishes as $\Delta t$ tends to 0. Therefore, $\lim_{\Delta t \to 0} \sigma_\mu = \lim_{\Delta t \to 0} \mathbf{E}$ and the entropy rate becomes:

$$ h_\mu \sim -\mu \left( \log_2 (\Delta t) + \int_0^\infty \phi(t) \log_2 \phi(t) \, dt \right) \Delta t . \tag{9} $$

With $\Delta t \to 0$, $h_\mu$ nominally tends to 0: As we shorten the observation time scale, spike events become increasingly rare. There are at least two known ways to address $h_\mu$ apparently not being very revealing when so defined. On the one hand, rather than focusing on the uncertainty per symbol, as $h_\mu$ does, we opt to look at the uncertainty per unit time: $h_\mu / \Delta t$. This is the so-called $\Delta t$-entropy rate [15] and it diverges as $-\mu \log \Delta t$. Such divergences are to be expected: The large literature on dimension theory characterizes a continuous set’s randomness by its divergence scaling rates [Ml ED]. Here, we are characterizing sets of similar cardinality: infinite sequences. On the other hand, paralleling the sequence block-entropy definition of entropy rate ($h_\mu = \lim_{\ell \to \infty} H(\ell)/\ell$) [51], continuous-time entropy rates are often approached within a continuous-time framework using:

$$ h_\mu = \lim_{T \to \infty} H(T)/T , $$

where $H(T)$ is path entropy, the continuous-time analogue of the block entropy $H(\ell)$ |^. In these analyses, any $\log \Delta t$ terms are regularized away using Shannon’s differential entropy [25], leaving the nondivergent component $-\mu \int_0^\infty \phi(t) \log \phi(t) \, dt$. Using the $\Delta t$-entropy rate but keeping both the divergent and nondivergent components, as in Eq. (9), is an approach that respects both viewpoints and gives a detailed picture of time-resolution scaling.
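
The scaling of the entropy rate per unit time can be checked exactly in the one case where the binned process is trivial: a Poisson neuron, whose spike/no-spike sequence is independent and identically distributed across bins. The sketch below compares the exact binary entropy per bin, divided by $\Delta t$, with the asymptotic form $-\mu (\log_2 \Delta t + \int_0^\infty \phi \log_2 \phi \, dt)$, using the closed-form value $\int_0^\infty \phi \log_2 \phi \, dt = \log_2 \mu - 1/\ln 2$ for an exponential density. The rate and bin sizes are arbitrary illustrative values.

```python
import math

def binary_entropy(p):
    """Entropy (bits) of a Bernoulli(p) variable."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def dt_entropy_rate_exact(rate, dt):
    """Exact h_mu/dt (bits per second) of a binned Poisson process.

    Each bin holds at least one spike with probability 1 - exp(-rate*dt),
    independently of all other bins.
    """
    return binary_entropy(1.0 - math.exp(-rate * dt)) / dt

def dt_entropy_rate_asymptotic(rate, dt):
    """Small-dt form -mu*(log2 dt + int phi log2 phi) for phi = rate*exp(-rate*t)."""
    differential = math.log2(rate) - 1.0 / math.log(2.0)
    return -rate * (math.log2(dt) + differential)

rate = 50.0   # spikes per second
for dt in (1e-3, 1e-4, 1e-5):
    print(dt, dt_entropy_rate_exact(rate, dt), dt_entropy_rate_asymptotic(rate, dt))
```

In the asymptotic form, every halving of the bin size raises the entropy rate per unit time by $\mu$ bits per second, which is the sense in which the divergence rate reports the firing rate.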

A major challenge in analyzing spike trains concerns locating the timescales on which information relevant to the stimulus is carried. Or, more precisely, we are often interested in estimating what percentage of the raw entropy of a neural spike train is used to communicate information about a stimulus; cf. the framing in Ref. [39]. For such analyses, the entropy rate is often taken to be $H(\Delta t, T)/T$, where $T$ is the total path time and $H(\Delta t, T)$ is the entropy of neural spike trains over time $T$ resolved at time bin size $\Delta t$. In terms of previously derived quantities and paralleling the well known block-entropy linear asymptote $H(\ell) = \mathbf{E} + h_\mu \ell$ [51], this is:

$$ \frac{H(\Delta t, T)}{T} \approx \frac{h_\mu(\Delta t)}{\Delta t} + \frac{\mathbf{E}(T, \Delta t)}{T} . $$

From the scaling analyses above, the extensive component of $H(\Delta t, T)/T$ diverges logarithmically in the small $\Delta t$ limit due to the logarithmic divergence (Eq. (9)) in $h_\mu(\Delta t)/\Delta t$. If we are interested in accurately estimating the entropy rate, then the above is one finite-time $T$ estimate of it. However, there are other estimators, including:

$$ \frac{H(\Delta t, T) - H(\Delta t, T - \Delta t)}{\Delta t} \approx \frac{h_\mu(\Delta t)}{\Delta t} + \frac{\partial \mathbf{E}(T, \Delta t)}{\partial T} . $$

This estimator converges more quickly to the true entropy rate $h_\mu(\Delta t)/\Delta t$ than does $H(\Delta t, T)/T$.

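No such $\log \Delta t$ divergences occur with $b_\mu$. Straightforward calculation, not shown here, reveals that:

$$
\lim_{\Delta t \to 0} \frac{b_\mu(\Delta t)}{\Delta t} = -\mu \left( \int_0^\infty \phi(t) \int_0^\infty \phi(t') \log_2 \phi(t + t') \, dt' \, dt + \frac{1}{\ln 2} - \int_0^\infty \phi(t) \log_2 \phi(t) \, dt \right) .
\tag{10}
$$

Since $\lim_{\Delta t \to 0} b_\mu(\Delta t)/\Delta t < \infty$ and $h_\mu(\Delta t)/\Delta t$ diverges, the ephemeral information rate $r_\mu(\Delta t)/\Delta t$ also diverges as $\Delta t \to 0$. The bulk of the information generated by such renewal processes is dissipated and, having no impact on future behavior, is not useful for prediction.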
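
A rough numerical check of the finite bound information rate is sketched below. It assumes the limit takes the form $-\mu \left( \int_0^\infty \int_0^\infty \phi(t) \phi(t') \log_2 \phi(t + t') \, dt' \, dt + 1/\ln 2 - \int_0^\infty \phi(t) \log_2 \phi(t) \, dt \right)$, which is what one obtains from $b_\mu = h_\mu - r_\mu$ for a renewal process in the small-bin limit; that form, the midpoint grid, and the helper name are assumptions of this sketch. For a Poisson neuron the three terms cancel, since nothing in the present helps predict the future.

```python
import math

def bound_information_rate(phi, mu, t_max, n=300):
    """Midpoint-grid estimate (bits per unit time) of the small-dt limit of b_mu/dt.

    Assumed form:
      -mu * ( double_int phi(t) phi(t') log2 phi(t+t') + 1/ln 2 - int phi log2 phi ).
    """
    h = t_max / n
    ts = [(k + 0.5) * h for k in range(n)]
    weights = [phi(t) * h for t in ts]          # phi(t) dt on the grid
    single = sum(w * math.log2(phi(t))
                 for w, t in zip(weights, ts) if w > 0)
    double = 0.0
    for wi, ti in zip(weights, ts):
        for wj, tj in zip(weights, ts):
            p = phi(ti + tj)
            if p > 0:
                double += wi * wj * math.log2(p)
    return -mu * (double + 1.0 / math.log(2.0) - single)

# Poisson neuron, unit rate: the bound information rate should vanish.
b_poisson = bound_information_rate(lambda t: math.exp(-t), mu=1.0, t_max=30.0)

# Gamma ISI with shape 2 and unit scale (mean ISI 2, so mu = 1/2).
b_gamma = bound_information_rate(lambda t: t * math.exp(-t), mu=0.5, t_max=30.0)

print("b_mu/dt (Poisson):", b_poisson)
print("b_mu/dt (gamma, shape 2):", b_gamma)
```

The coarse grid keeps the double integral cheap; the estimates are only good to a few parts in a thousand, which is enough to see that the Poisson value is zero while the gamma value is finite and positive.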

Were we allowed to observe relatively microscopic membrane voltage fluctuations rather than being restricted to the relatively macroscopic spike sequence, the $\Delta t$-scaling analysis would be entirely different. Following Ref. [55] or natural extensions thereof, the statistical complexity diverges as $-\log \epsilon$, where $\epsilon$ is the resolution level for the membrane voltage, the excess entropy diverges as $\log 1/\Delta t$, the time-normalized entropy rate diverges as $\log \sqrt{2 \pi e D \Delta t}/\Delta t$, and the time-normalized bound information diverges as $1/2\Delta t$. In other words, observing membrane voltage rather than spikes makes the process far more predictable. The relatively more macroscopic modeling at the level of spikes throws away much detail of the underlying biochemical dynamics.

To illustrate the previous points, we turn to numerics and a particular neural model. Consider an (unleaky) integrate-and-fire neuron driven by white noise whose membrane voltage (after suitable change of parameters) evolves according to:

$$ \frac{dV}{dt} = b + \sqrt{D} \, \eta(t) , \tag{11} $$

where $\eta(t)$ is white noise such that $\langle \eta(t) \rangle = 0$ and $\langle \eta(t) \eta(t') \rangle = \delta(t - t')$. When $V = 1$, the neuron spikes and the voltage is reset to $V = 0$; it stays at $V = 0$ for a time $\tau$, which enforces a hard refractory period. Since the membrane voltage resets to a predetermined value, the interspike intervals produced by this model are independently drawn from the same interspike interval distribution:

$$
\phi(t) =
\begin{cases}
0 , & t < \tau , \\
\sqrt{\dfrac{\lambda}{2 \pi (t - \tau)^3}} \exp\left( -\dfrac{\lambda \mu^2 (t - \tau - 1/\mu)^2}{2 (t - \tau)} \right) , & t \geq \tau .
\end{cases}
\tag{12}
$$

Here, $1/\mu = 1/b$ is the mean interspike interval and $\lambda = 1/D$ is a shape parameter that controls ISI variance. This neural model is not as realistic as that of a linear leaky integrate-and-fire neural model [IS], but is complex enough to illustrate the points made earlier about the scaling of information measures and time resolution.

FIG. 3. An unleaky integrate-and-fire neuron driven by white noise has varying interevent count distributions $F(n)$ that depend on time bin size $\Delta t$. Based on the ISI distribution given in Eq. (12) with $\tau = 2$ milliseconds, $1/\mu = 1$ millisecond, and $\lambda = 1$ millisecond. Data points represent exact values of $F(n)$ calculated for integer values of $n$. Dashed lines are interpolations based on straight line segments connecting nearest neighbor points.
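
For readers who want to reproduce the qualitative behavior of the interevent count distribution, the sketch below tabulates $F(n) \approx \phi(n \Delta t) \Delta t$ at several bin sizes. It assumes that the ISI density of the unleaky integrate-and-fire neuron is the standard first-passage-time (inverse Gaussian) density of drifted Brownian motion, shifted by the refractory period; the parameter names `tau`, `mean`, and `shape` and the cutoff `t_max` are choices of this sketch, with all times in milliseconds.

```python
import math

def isi_density(t, tau, mean, shape):
    """Shifted inverse-Gaussian ISI density (assumed form, times in ms).

    tau: hard refractory period; mean: mean first-passage time (1/b);
    shape: inverse-Gaussian shape parameter (1/D).
    """
    s = t - tau
    if s <= 0.0:
        return 0.0
    prefactor = math.sqrt(shape / (2.0 * math.pi * s ** 3))
    return prefactor * math.exp(-shape * (s - mean) ** 2 / (2.0 * mean ** 2 * s))

def interevent_distribution(dt, tau, mean, shape, t_max):
    """Small-bin approximation F(n) = phi(n*dt)*dt for n*dt < t_max."""
    n_bins = int(round(t_max / dt))
    return [isi_density(n * dt, tau, mean, shape) * dt for n in range(n_bins)]

tau, mean, shape = 2.0, 1.0, 1.0   # ms, following the parameter values quoted above
for dt in (0.5, 0.1, 0.05):
    F = interevent_distribution(dt, tau, mean, shape, t_max=80.0)
    peak = max(range(len(F)), key=F.__getitem__)
    print(f"dt = {dt} ms: sum F = {sum(F):.4f}, most likely count n = {peak}")
```

As the bins shrink, the most likely silence length grows roughly as $1/\Delta t$, so the same ISI statistics appear to span many more symbols; this is the sense in which finer time resolution stretches the apparent range of temporal correlations.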

For illustration purposes, we assume that the time-binned neural spike train is well approximated by a renewal process, even when $\Delta t$ is as large as one millisecond. This assumption will generally not hold, as past interevent counts could provide more detailed historical information that more precisely places the last spike within its time bin. Even so, the reported information measure estimates are still useful. The estimated $h_\mu$ is an upper bound on the true entropy rate; the reported $\mathbf{E}$ is a lower bound on the true excess entropy using the Data Processing Inequality |5S]; and the reported $C_\mu$ will usually be a lower bound on the true process’ statistical complexity.

Employing the renewal process assumption, numerical analysis corroborates
the infinitesimal analysis above. The corresponding figure plots $F(n)$,
the proxy for the full, continuous-time ISI distribution, for a given set
of neuronal parameter values as a function of time resolution. Figure 4
then shows that $h_\mu$ and $C_\mu$ exhibit logarithmic scaling at
millisecond time discretizations, but that $\mathbf{E}$ does not converge
to its continuous-time value until we reach time discretizations on the
order of hundreds of microseconds. Even when $\Delta t = 100\ \mu$s,
$b_\mu(\Delta t)/\Delta t$ still has not converged to its continuous-time
value.

FIG. 4. How spike-train information measures (or rates) depend on time discretization $\Delta t$ for an unleaky integrate-and-fire
neuron driven by white noise. Top left: Statistical complexity $C_\mu$ as a function of both the ISI distribution shape parameters
and the time bin size $\Delta t$. The horizontal axis is $\Delta t$ in milliseconds on a log-scale and the vertical axis is $C_\mu$ in bits on a linear
scale for three different ISI distributions following Eq. (12) with r = 2 milliseconds. Top right: Entropy rate $h_\mu$ also as a
function of both shape parameters and $\Delta t$. Axes labeled as in the previous panel and the same three ISI distributions are used.
Bottom left: Excess entropy $\mathbf{E}$ as a function of both the shape parameters and $\Delta t$. For the blue line $\lim_{\Delta t \to 0} \mathbf{E}(\Delta t) = 0.75$
bits; purple line, $\lim_{\Delta t \to 0} \mathbf{E}(\Delta t) = 0.86$ bits; and yellow line, $\lim_{\Delta t \to 0} \mathbf{E}(\Delta t) = 0.41$ bits. All computed from Eq. 0. Bottom
right: Bound information rate $b_\mu(\Delta t)/\Delta t$ parametrized as in the previous panels. For the blue line $\lim_{\Delta t \to 0} b_\mu(\Delta t)/\Delta t = 0.73$
bits per second; purple line, $\lim_{\Delta t \to 0} b_\mu(\Delta t)/\Delta t = 1.04$ bits per second; and yellow line, $\lim_{\Delta t \to 0} b_\mu(\Delta t)/\Delta t = 0.30$ bits per
second. All computed from Eq. (10).

The statistical complexity increases without bound as $\Delta t \to 0$;
see the top left panel of Fig. 4. As suggested in the infinitesimal
renewal analysis, $h_\mu$ vanishes, whereas $h_\mu/\Delta t$ diverges at a
rate of $\mu \log_2 (1/\Delta t)$, as shown in the top right panel of
Fig. 4. As anticipated, $\mathbf{E}$ tends to a finite, ISI
distribution-dependent value when $\Delta t$ tends to 0, as shown in the
bottom left panel of Fig. 4. Finally, the lower right panel plots
$b_\mu(\Delta t)/\Delta t$.

One conclusion from this simple numerical analysis is that one should
consider going to submillisecond time resolutions to obtain accurate
estimates of $\lim_{\Delta t \to 0} \mathbf{E}(\Delta t)$ and
$\lim_{\Delta t \to 0} b_\mu(\Delta t)/\Delta t$, even though the
calculated informational values are a few bits or even less than one bit
per second in magnitude.
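
The time-resolution scaling described above can be reproduced in a few lines. The following is a minimal sketch, not the code used for Fig. 4. It assumes the binned spike train is a renewal process and uses the standard discrete-time renewal expressions (entropy of the interevent count divided by its mean, and entropy of the counts-since-last-spike distribution). The function name `binned_renewal_measures`, the truncation time `t_max`, and the Erlang-2 ISI distribution are illustrative choices made only because they give a closed-form survival function.

```python
import numpy as np

def binned_renewal_measures(survival, dt, t_max):
    """Entropy rate, statistical complexity, and excess entropy (all in bits)
    of a renewal process observed in time bins of width dt.

    survival: callable giving Phi(t) = Pr(ISI > t) on a NumPy array of times.
    t_max:    truncation time; Phi(t_max) should be negligible.
    """
    edges = np.arange(0.0, t_max + dt, dt)
    phi = survival(edges)
    F = phi[:-1] - phi[1:]           # F(n): probability the ISI lands in bin n
    F = F / F.sum()                  # renormalize after truncating the tail
    w = np.cumsum(F[::-1])[::-1]     # w(n) = sum_{n' >= n} F(n')
    Z = w.sum()                      # mean ISI measured in bins
    pi = w / Z                       # distribution of bins since last spike

    def sum_plog2p(p):
        p = p[p > 0]
        return float(np.sum(p * np.log2(p)))

    h = -sum_plog2p(F) / Z           # entropy rate, bits per bin
    C = -sum_plog2p(pi)              # H[bins since last spike]; equals C_mu
                                     # only if the process is not Poisson-like
    n = np.arange(F.size)
    ok = F > 0
    E = 2.0 * C + float(np.sum((n[ok] + 1) * F[ok] / Z * np.log2(F[ok] / Z)))
    return h, C, E

# Hypothetical example: Erlang-2 (gamma, shape 2) ISIs with rate parameter r.
r = 1.0
erlang2_survival = lambda t: (1.0 + r * t) * np.exp(-r * t)

for dt in (1e-1, 1e-2, 1e-3):
    h, C, E = binned_renewal_measures(erlang2_survival, dt, t_max=60.0 / r)
    print(f"dt={dt:g}  h/dt={h / dt:.3f}  C={C:.3f}  E={E:.3f}")
```

Under these assumptions, `h / dt` and `C` should keep growing as `dt` shrinks while `E` settles toward a constant, which is the qualitative behavior discussed in the text.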


IV. ALTERNATING RENEWAL PROCESSES 

The form of the $\Delta t$-scalings discussed in Sec. III occurs much
more generally than indicated there. Often, our aim is to calculate the
nondivergent component of these information measures as $\Delta t \to 0$,
but the rates of these scalings are process-dependent. Therefore, these
divergences can be viewed as a feature rather than a bug; they contain
additional information about the process' structure [25].

To illustrate this point, we now investigate $\Delta t$-scalings for
information measures of alternating renewal processes (ARPs), which are
structurally more complex than the standard renewal processes considered
above. For instance, these calculations suggest that rates of divergence
of the $\tau$-entropy rate smaller than the firing rate, such as those
seen in Ref. [40], are indicative of strong ISI correlations.
Calculational details are sequestered in App. A.

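In an ARP, an ISI is drawn from one distribution $\phi^{(1)}(t)$, then
another distribution $\phi^{(2)}(t)$, then the first again, and so on. We
refer to the new piece of additional information, the ISI distribution
currently being drawn from, as the modality. Under weak technical
conditions, the causal states are the modality and time since last spike.
The corresponding, generic $\epsilon$-machine is shown in Fig. 5. We
define the modality-dependent survival functions as
$\Phi^{(i)}(t) = \int_t^\infty \phi^{(i)}(t')\,dt'$, the
modality-dependent mean firing rates as:

$$\frac{1}{\mu^{(i)}} = \int_0^\infty t\,\phi^{(i)}(t)\,dt , \qquad (13)$$

the modality-dependent differential entropy rates:

$$h_\mu^{(i)} = -\mu^{(i)} \int_0^\infty \phi^{(i)}(t) \log_2 \phi^{(i)}(t)\,dt ,$$

the modality-dependent continuous-time statistical complexity:

$$C_\mu^{(i)} = -\int_0^\infty \mu^{(i)}\Phi^{(i)}(t) \log_2\!\left(\mu^{(i)}\Phi^{(i)}(t)\right) dt ,$$

and the modality-dependent excess entropy:

$$\mathbf{E}^{(i)} = \int_0^\infty \mu^{(i)} t\,\phi^{(i)}(t) \log_2\!\left(\mu^{(i)}\phi^{(i)}(t)\right) dt
 - 2\int_0^\infty \mu^{(i)}\Phi^{(i)}(t) \log_2\!\left(\mu^{(i)}\Phi^{(i)}(t)\right) dt . \qquad (14)$$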

It is straightforward to show, as done in App. A,
that the time-normalized entropy rate still scales with
$\log_2 (1/\Delta t)$:

h^At) 

At ^ /i(i) + ^(2) At y ^ ^(1) + ^(2) ■ 

(15) 


As expected, the statistical complexity still diverges: 


C^(At) - 2 logs ( + 

Ml 

Ml + M2 




/l(l) +^(2) 


Hb 


(16) 


where $H_b(p) = -p \log_2 p - (1-p) \log_2 (1-p)$ is the entropy in bits
of a Bernoulli random variable with bias $p$. Finally, the excess entropy
still limits to a positive constant:


lim E(At) 

At->-0 


Hb 


Ml \ 
Ml + M2 / 


/l(l) +p(2) • 

(17) 


The additional terms $H_b(\cdot)$ come from the information
stored in the time course of modalities.

As a point of comparison, we ask what these information measures would be
for the original (noncomposite) renewal process with the same ISI
distribution as the ARP. As described in App. A, the former's entropy
rate is always less than the true $h_\mu$; its statistical complexity is
always less than the true $C_\mu$; and its excess entropy is always
smaller than the true $\mathbf{E}$. In particular, the ARP's $h_\mu$
divergence rate is always less than or equal to the mean firing rate
$\mu$. Interestingly, this coincides with what was found empirically in
the time series of a single neuron; see Fig. 5C in Ref. [40].

The ARPs here are a first example of how one can calculate information
measures of the much broader and more structurally complex class of
processes generated by unifilar hidden semi-Markov models, a subclass of
hidden semi-Markov models [72].
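
The connection between alternation and ISI correlations can be illustrated with a small simulation. The sketch below is an illustration only, not the calculation of App. A. It assumes exponential ISI distributions for both modalities, chosen purely for simplicity, and the helper names `sample_arp_isis` and `lag1_isi_correlation` are hypothetical.

```python
import numpy as np

def sample_arp_isis(mu1, mu2, n_isis, seed=0):
    """Draw ISIs alternately from two exponential distributions with
    firing rates mu1 and mu2 (an assumed, illustrative choice of modalities)."""
    rng = np.random.default_rng(seed)
    isis = np.empty(n_isis)
    isis[0::2] = rng.exponential(1.0 / mu1, size=isis[0::2].size)
    isis[1::2] = rng.exponential(1.0 / mu2, size=isis[1::2].size)
    return isis

def lag1_isi_correlation(isis):
    """Sample correlation coefficient between successive ISIs."""
    return float(np.corrcoef(isis[:-1], isis[1:])[0, 1])

isis = sample_arp_isis(mu1=1.0, mu2=10.0, n_isis=100_000)
print("lag-1 ISI correlation:", lag1_isi_correlation(isis))
```

When the two modalities have different mean ISIs, successive intervals are anticorrelated even though each interval is drawn independently given its modality. A renewal process fitted to the pooled ISI distribution would discard exactly this structure.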


FIG. 5. $\epsilon$-Machine for an alternating renewal process in which neither interevent count distribution is $\Delta$-Poisson and they are
not equal almost everywhere. State label $n_m$ denotes $n$ counts since the last event and present modality $m$.


V. INFORMATION UNIVERSALITY 

Another aim of ours was to interpret the information measures. In
particular, we wished to relate infinitesimal time-resolution excess
entropies, statistical complexities, entropy rates, and bound information
rates to more familiar characterizations of neural spike trains: firing
rates $\mu$ and ISI coefficients of variation $C_V$. To address this, we
now analyze a suite of familiar single-neuron models. We introduce the
models first, describe the parameters behind our numerical estimates, and
then compare the information measures.

Many single-neuron models, when driven by temporally uncorrelated and
stationary input, produce neural spike trains that are renewal processes.
We just analyzed one model class, the noisy integrate-and-fire (NIF)
neurons, in Sec. III, focusing on time-resolution dependence. Other common
neural models include the linear leaky integrate-and-fire (LIF) neuron,
whose dimensionless membrane voltage, after a suitable change of
parameters, fluctuates as:

$$\frac{dV}{dt} = b - V + a\,\eta(t) , \qquad (18)$$

and when $V = 1$, a spike is emitted and $V$ is instantaneously reset to
0. We computed ISI survival functions from empirical histograms of 10^
ISIs; we varied $b \in [1.5, 5.75]$ in steps of 0.25 and
$a \in [0.1, 3.0]$ in steps of 0.1 up to $a = 1.0$ and in steps of 0.25
thereafter.

The quadratic integrate-and-fire (QIF) neuron has membrane voltage
fluctuations that, after a suitable change of variables, are described by:

$$\frac{dV}{dt} = b + V^2 + a\,\eta(t) , \qquad (19)$$

and when $V = 100$, a spike is emitted and $V$ is instantaneously reset
to $-100$. We computed ISI survival functions from empirical histograms
of trajectories with 10® ISIs; we varied $b \in [0.25, 4.75]$ in steps of
0.25 and $a \in [0.25, 2.75]$ in steps of 0.25. The QIF neuron has a very
different dynamical behavior from the LIF neuron, exhibiting a
saddle-node bifurcation at $b = 0$. Simulation details are given in
App. B.

Finally, ISI distributions are often fit to gamma distributions, and so
we also calculated the information measures of spike trains with
gamma-distributed ISIs (GISI).

Each neural model (NIF, LIF, QIF, and GISI) has its own set of parameters
that governs its ISI distribution shape. Taken at face value, this would
make it difficult to compare information measures across models.
Fortunately, for each of these neural models, the firing rate $\mu$ and
coefficient of variation $C_V$ uniquely determine the underlying model
parameters m. As App. B shows, the quantities
$\lim_{\Delta t \to 0} \mathbf{E}(\Delta t)$,
$\lim_{\Delta t \to 0} C_\mu(\Delta t) + \log_2(\mu \Delta t)$,
$\lim_{\Delta t \to 0} h_\mu(\Delta t)/\mu\Delta t + \log_2(\mu \Delta t)$, and
$\lim_{\Delta t \to 0} b_\mu(\Delta t)/\mu\Delta t$ depend only on the ISI
coefficient of variation $C_V$ and not the mean firing rate $\mu$.

We estimated information measures from the simulated spike train data
using plug-in estimators based on the formulae in Sec. III. Enough data
was generated that even naive plug-in estimators were adequate, except for
estimating $b_\mu$ when $C_V$ was larger than 1. See App. B for
estimation details. That said, binned estimators are likely inferior to
binless entropy estimators m, and naive estimators tend to have large
biases. A detailed analysis goes beyond the present scope and is an
interesting direction for future research.

Figure 6 compares the statistical complexity, excess entropy, entropy
rate, and bound information rate for all four neuron types as a function
of their $C_V$. Surprisingly, the NIF, LIF, and QIF neurons' information
measures have essentially identical dependence on $C_V$. That is, the
differences in mechanism do not strongly affect these informational
properties of the spike trains they generate. Naturally, this leads one to
ask if the informational indifference to mechanism generalizes to other
spike train model classes and stimulus-response settings.

FIG. 6. Information universality across distinct neuron dynamics. We find that several information measures depend only on
the ISI coefficient of variation $C_V$ and not the ISI mean firing rate $\mu$ for the following neural spike train models: (i) neurons with
gamma-distributed ISIs (GISI, blue), (ii) noisy integrate-and-fire neurons governed by Eq. 0 (NIF, green), (iii) noisy linear
leaky integrate-and-fire neurons governed by Eq. (18) (LIF, dotted red), and (iv) noisy quadratic integrate-and-fire neurons
governed by Eq. (19) (QIF, dotted blue). Top left: $\lim_{\Delta t \to 0} C_\mu(\Delta t) + \log_2(\mu\Delta t)$. Top right:
$\lim_{\Delta t \to 0} h_\mu(\Delta t)/\mu\Delta t + \log_2(\mu\Delta t)$. Bottom left: $\lim_{\Delta t \to 0} \mathbf{E}(\Delta t)$. Bottom right:
$\lim_{\Delta t \to 0} b_\mu(\Delta t)/\mu\Delta t$. In the latter, ISI distributions with smaller $C_V$ are
excluded due to the difficulty of accurately estimating the double integral involving $\log_2 \phi(t + t')$ from simulated spike trains. See text
for discussion.

Figure 6's top left panel shows that the continuous-time statistical
complexity grows monotonically with increasing $C_V$. In particular, the
statistical complexity increases logarithmically with ISI mean and
approximately linearly with the ISI coefficient of variation $C_V$. That
is, the number of bits that must be stored to predict these processes
increases in response to additional process stochasticity and longer
temporal correlations. In fact, it is straightforward to show that the
statistical complexity is minimized and excess entropy maximized at fixed
$\mu$ when the neural spike train is periodic. This is unsurprising
since, in the space of processes, periodic processes are least cryptic
($C_\mu - \mathbf{E} = 0$) and so knowledge of oscillation phase is enough
to completely predict the future. (See App. I)

The bottom left panel in Figure 6 shows that increasing $C_V$ tends to
decrease the excess entropy $\mathbf{E}$, the number of bits that one can
predict about the future. $\mathbf{E}$ diverges for small $C_V$, dips at
the $C_V$ where the ISI distribution is closest to exponential, and limits
to a small number of bits at large $C_V$. At small $C_V$, the neural
spike train is close to noise-free periodic behavior. When analyzed at
small but nonzero $\Delta t$, $\mathbf{E}$ encounters an "ultraviolet
divergence" m. Thus, $\mathbf{E}$ diverges as $C_V \to 0$, and a simple
argument in App. B suggests that the rate of divergence is
$\log_2 (1/C_V)$. At an intermediate $C_V \approx 1$, the ISI distribution
is as close as possible to that of a memoryless Poisson process and so
$\mathbf{E}$ is close to vanishing. At larger $C_V$, the neural spike
train is noise-driven. Surprisingly, completely noise-driven processes
still have a fraction of a bit of predictability: knowing the time since
last spike allows for some power in predicting the time to next spike.

The top right panel shows that an appropriately rescaled differential
entropy rate varies differently for neural spike trains from noisy
integrate-and-fire neurons and neural spike trains with gamma-distributed
ISIs. As expected, the entropy rate is maximized at $C_V$ near 1,
consistent with the Poisson process being the maximum entropy distribution
for fixed mean ISI. Gamma-distributed ISIs are far less random than ISIs
from noisy integrate-and-fire neurons, holding $\mu$ and $C_V$ constant.

Finally, the continuous-time bound information ($b_\mu$) rate varies in a
similar way to $\mathbf{E}$ with $C_V$. (Note that since the plotted
quantity is $\lim_{\Delta t \to 0} b_\mu(\Delta t)/\mu\Delta t$, one
could interpret the normalization by $1/\mu$ as a statement about how the
mean firing rate $\mu$ sets the natural
timescale.) At low $C_V$, the $b_\mu$ rate diverges as $1/C_V$, as
described in App. B. Interestingly, this limit is singular,
similar to the results in Ref. m: at $C_V = 0$, the spike
train is noise-free periodic and so the $b_\mu$ rate is 0. For
$C_V \approx 1$, it dips for the same reason that $\mathbf{E}$ decreases.
For larger $C_V$, $b_\mu$'s behavior depends rather strongly on the ISI
distribution shape. The longer-ranged gamma distribution results in an
ever-increasing $b_\mu$ rate for larger $C_V$, while the $b_\mu$ rate of
neural spike trains produced by NIF neurons tends to a small positive
constant at large $C_V$. The variation of $b_\mu$ deviates from that of
$\mathbf{E}$ qualitatively at larger $C_V$ in that the GISI spike trains
yield smaller total predictability $\mathbf{E}$ than that of NIF neurons,
but an arbitrarily higher predictability rate.

These calculations suggest a new kind of universality for neuronal
information measures within a particular generative model class. All of
these distinct integrate-and-fire neuron models generate ISI distributions
from different families, yet their informational properties exhibit the
same dependencies on $\Delta t$, $\mu$, and $C_V$ in the limit of small
$\Delta t$. Neural spike trains with gamma-distributed ISIs did not show
similar informational properties. Nor would we expect neural spike trains
that are alternating renewal processes to show similar informational
properties. (See Sec. IV.) These coarse information quantities might
therefore be effective model selection tools for real neural spike train
data, though more groundwork is needed to ascertain their utility.


VI. CONCLUSIONS 

We explored the scaling properties of a variety of information-theoretic
quantities associated with two classes of spiking neural models: renewal
processes and alternating renewal processes. We found that information
generation (entropy rate) and stored information (statistical complexity)
both diverge logarithmically with decreasing time resolution for both
types of spiking models, whereas the predictable information (excess
entropy) and active information accumulation (bound information rate)
limit to a constant. Our results suggest that the excess entropy and
regularized statistical complexity of different types of
integrate-and-fire neurons are universal in the sense that they do not
depend on mechanism details, indicating a surprising simplicity in complex
neural spike trains. Our findings highlight the importance of analyzing
the scaling behavior of information quantities, rather than assessing
these only at a fixed temporal resolution.

By restricting ourselves to relatively simple spiking models we have been
able to establish several key properties of their behavior. There are, of
course, other important spiking models that cannot be expressed as renewal
processes or alternating renewal processes, but we are encouraged by the
robust scaling behavior of the entropy rate, statistical complexity,
excess entropy, and bound information rate over the range of models we
considered.

There was a certain emphasis here on the entropy rate and hidden Markov
models of neural spike trains, both familiar tools in computational
neuroscience. On this score, our contributions are straightforward. We
determined how the entropy rate varies with the time discretization and
identified the possibly infinite-state, unifilar HMMs required for optimal
prediction of spike-train renewal processes. Entropy rate diverges
logarithmically for stochastic processes [15], and this has been observed
empirically for neural spike trains for time discretizations in the
submillisecond regime [40]. We argued that the divergence rate is an
important characteristic. For renewal processes, it is the mean firing
rate; for alternating renewal processes, the "reduced mass" of the mean
firing rates. Our analysis of the latter, more structured processes showed
that a divergence rate less than the mean firing rate, also seen
experimentally [40], indicates that there are strong correlations between
ISIs. Generally, the nondivergent component of the time
discretization-normalized entropy rate is the differential entropy rate;
e.g., as given in Ref. m.

Empirically studying information measures as a function of time
resolution can lead to a refined understanding of the time scales over
which neuronal communication occurs. Regardless of the information measure
chosen, the results and analysis here suggest that much can be learned by
studying scaling behavior rather than focusing only on neural information
as a single quantity estimated at a fixed temporal resolution. While we
focused on the regime in which the time discretization was smaller than
any intrinsic timescale of the process, future and more revealing analyses
would study scaling behavior at even smaller time resolutions to directly
determine intrinsic time scales m.

Going beyond information generation (entropy rate), we analyzed
information measures, namely statistical complexity and excess entropy,
that have only recently been used to understand neural coding and
communication. Their introduction is motivated by the hypothesis that
neurons benefit from learning to predict their inputs m, which can consist
of the neural spike trains of upstream neurons. The statistical complexity
is the minimal amount of historical information required for exact
prediction. To our knowledge, the statistical complexity has appeared only
once previously in computational neuroscience [S5]. The excess entropy, a
closely related companion, is the maximum amount of information that can
be predicted about the future. When it diverges, its divergence rate is
quite revealing of the underlying process EOllZZ], but none of the model
neural spike trains studied here had divergent excess entropy. Finally,
the bound information rate has yet to be deployed in the context of neural
coding, though related quantities have drawn attention elsewhere, such as
in nonlinear dynamics [54] and information-based reinforcement learning m.
Though its potential uses have yet to be exploited, it is an interesting
quantity in that it captures the rate at which spontaneously generated
information is actively stored by neurons. That is, it quantifies how
neurons harness randomness.

Our contributions to this endeavor are more substantial than the
preceding points. We provided exact formulae for the above quantities for
renewal processes and alternating renewal processes. The new expressions
can be developed further as lower bounds and empirical estimators for a
process' statistical complexity, excess entropy, and bound information
rate. This parallels how the renewal-process entropy-rate formula is a
surprisingly accurate entropy-rate estimator [80]. By deriving explicit
expressions, we were able to analyze time-resolution scaling, showing that
the statistical complexity diverges logarithmically for all but Poisson
processes. So, just like the entropy rate, any calculation of the
statistical complexity (e.g., as in Ref. ) should be accompanied by its
time discretization dependence. Notably, the excess entropy and the bound
information rate have no such divergences.

To appreciate more directly what neural information processing behavior
these information measures capture in the continuous-time limit, we
studied them as functions of the ISI coefficient of variation. With an
appropriate renormalization, simulations revealed surprising simplicity: a
universal dependence on the coefficient of variation across several
familiar neural models. The simplicity is worth investigating further
since the dynamics and biophysical mechanisms implicit in the alternative
noisy integrate-and-fire neural models are quite different. If other
generative models of neural spike trains also show similar information
universality, then these information measures might prove useful as model
selection tools.

Finally, we close with a discussion of a practical issue related to the
scaling analyses, one that is especially important given the increasingly
sophisticated neuronal measurement technologies coming online at a rapid
pace [47]. How small should $\Delta t$ be to obtain correct estimates of
neuronal communication? First, as we emphasized, there is no single
"correct" estimate for an information quantity; rather, its resolution
scaling is key. Second, results presented here and in a previous study by
others m suggest that extracting information scaling rates and
nondivergent components can require submillisecond time resolution. Third,
and importantly, the regime of infinitesimal time resolution is exactly
the limit in which computational efforts without analytic foundation will
fail or, at a minimum, be rather inefficient. As such, we hope that the
results and methods developed here will be useful to these future
endeavors and guide how new technologies facilitate scaling analysis.
Appendix A: Alternating renewal process
information measures

A discrete-time alternating renewal process draws counts from $F_1(n)$,
then $F_2(n)$, then $F_1(n)$, and so on. We now show that the modality and
counts since last event are causal states when $F_1 \neq F_2$ almost
everywhere and when neither $F_1$ nor $F_2$ is eventually
$\Delta$-Poisson. We present only a proof sketch.

Two pasts $x_{:0}$ and $x'_{:0}$ belong to the same causal state when
$\Pr(X_{0:} \mid X_{:0} = x_{:0}) = \Pr(X_{0:} \mid X_{:0} = x'_{:0})$. We
can describe the future uniquely by a sequence of interevent counts
$\mathcal{N}_i$, $i \geq 1$, and the counts till next event
$\mathcal{N}_0'$. Likewise, we could describe the past as a sequence of
interevent counts $\mathcal{N}_i$, $i < 0$, and the counts since last
event $\mathcal{N}_0 - \mathcal{N}_0'$. Let $M_i$ be the modality at time
step $i$. So, for instance, $M_0$ is the present modality.

First, we claim that one can infer the present modality 
from a semi-infinite past almost surely. The probability 
that the present modality is 1 having observed the last 
2M events is: 

Pr(Adg = l\Af-2M-.-l = 

2M 

= F2{ni)Fi{ni-i) . 

i— — l.odd 


|25) . And, we also have: 

L ^ lo ^ 2 (n 0 ^ j-D[Fi\\F2] Mo = l 

^ i=-Mven ^ \D[F 2 \\Fi] Mo = 2 ' 

This implies that: 

D[F2\\Fi]-D[F^\\F2] \l Mo = l 
M™oo 2 [-1 Ado = 2 ■ 

We only fail to identify the present modality almost 
surely from the semi-infinite past if limM-^oo Q = 0. Oth¬ 
erwise, the unnormalized difference of the log likelihoods: 

Pr(Ado = l|A/^-i = n,-i) 

Pr(Ado = 2|A^_i = n--i) 

tends to ± 00 , implying that one of the two probabilities 
has vanished. From the expression, limM->.oo Q = 0 only 
happens when £>[^ 211 X 1 ] = D[Fi||£ 2 ]- However, equality 
requires that Fi(n) = F 2 {n) almost everywhere. 

Given the present modality, we also need to know the 
counts since the last event in order to predict the future 
as well as possible. The proof of this is very similar to 
those given in Ref. [35]. The conditional probability 
distribution of future given past is: 


Similarly, the probability that the present modality is 2 
having observed the last 2M events is: 


Pr(Xg:|Xg = a::g) = Pr(A/'i: |AAg, Xg = a;,g) 

Pr(A/'g|Xg = x.,o) ■ 


Pr(Adg = 2\M-2M-.-i = n- 2 M:-i) 

2M 

= Fi{ni)F2{ni-i) . 

i——l,odd 

We are better served by thinking about the normalized 
difference of the corresponding log likelihoods: 

^_1 . P(Mo = 1\N-2M:-1 = n-2M-.-l) 

2 M ^ F(Mo = 2 \J\f- 2 M:-l = n-2M-.-l) 


Since the present modality is identifiable from the past 
x-o, and since interevent counts are independent given 
modality: 

Pr(A/'l:|AAg,Xg = a;:g) = Pr(A/'i:|Mg = TOg(n:_i)) . 

So, it is necessary to know the modality in order to pre¬ 
dict the future as well as possible. By virtue of how the 
alternating renewal process is generated, the second term 
is: 


Some manipulation leads to: 


Q 


1 1 2 M 

0 \ M ^ 


log 


l.odd 


F2{ni) 

Fi{ni) 


1 

M 


2M 


Y. log 


Fi{ni)\ 

X2(n,)J’ 


and, almost surely in the limit of M —> 00 : 

^ ^ ^ \D[F2\\Fi\ Mo = l 

^ i=^odd ^ \-f4[Fi||F2] X(o = 2 ’ 

(Al) 

where £)[P||(5] is the information gain between P and Q 


Pr(A/g|Xg = x.,o) = Pr(AAo|A/'o = ng. Mg = mo{n.,-i)) . 

A very similar term was analyzed in Ref. |49j . and that 
analysis revealed that it was necessary to store the counts 
since last spike when neither Fi nor F 2 is eventually A- 
Poisson. 

Identifying causal states $\mathcal{S}^+$ as the present modality $M_0$
and the counts since last event $\mathcal{N}_0'$ immediately allows us to
calculate the statistical complexity and entropy rate. The entropy rate
can be calculated via:

random variable, such that: 


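$$h_\mu = H[X_0 \mid M_0, \mathcal{N}_0']
 = \pi(M_0 = 1)\, H[X_0 \mid M_0 = 1, \mathcal{N}_0']
 + \pi(M_0 = 2)\, H[X_0 \mid M_0 = 2, \mathcal{N}_0'] .$$

The statistical complexity is:

$$C_\mu = H[\mathcal{S}^+] = H[M_0, \mathcal{N}_0']
 = H[M_0] + \pi(M_0 = 1)\, H[\mathcal{N}_0' \mid M_0 = 1]
 + \pi(M_0 = 2)\, H[\mathcal{N}_0' \mid M_0 = 2] . \qquad \text{(A2)}$$

Finally, it is straightforward to show that the modality $M_1$ at time
step 1 and the counts to next event are the reverse-time causal states
under the same conditions on $F_1$ and $F_2$. Therefore:

$$\mathbf{E} = I[\mathcal{S}^+ ; \mathcal{S}^-]
 = I[M_0, \mathcal{N}_0' ; M_1, \mathcal{N}_0 - \mathcal{N}_0']
 = I[M_0 ; M_1, \mathcal{N}_0 - \mathcal{N}_0']
 + I[\mathcal{N}_0' ; M_1, \mathcal{N}_0 - \mathcal{N}_0' \mid M_0] .$$

One can continue in this way to find formulae for other information
measures of a discrete-time alternating renewal process.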

These formulae can be rewritten in terms of the modality-dependent
information measures of Eqs. (13)-(14) if we recognize two things. First,
the probability of a particular modality is proportional to the average
amount of time spent in that modality. Second, for reasons similar to
those outlined in Ref. [IH], the probability of counts since last event
given a particular present modality $i$ is proportional to $w_i(n)$.
Hence, in the infinitesimal time discretization limit, the probability of
modality 1 is:


7r(Ado = 1) = 




( 1 ) 


(1) -H u(2) 


and similarly for modality 2. Then, the entropy rate out 
of modality i is: 


H[Xi\Mo = i,K] 


At ( log2 


1 

At 


h«(At) 


and the modality-dependent statistical complexity di¬ 
verges as: 

id[A/'o|Ado = i] ~ log 2 1/At -I- C'^(At) . 

Finally, in continuous-time Ado and Adi limit to the same 


lim E(At) = dffAdol 
At-s-O 


lim /[AA/;A/,-A/^|Alo] 
At-iO 


Note that $\mathbf{E}^{(i)} = \lim_{\Delta t \to 0}
I[\mathcal{N}_0' ; \mathcal{N}_0 - \mathcal{N}_0' \mid M_0 = i]$.
Bringing these results together, we substitute the above components into
Eq. (A2)'s expression for $C_\mu$ and, after details not shown here, find
the expression quoted in the main text as Eq. (16). We proceed similarly
for $h_\mu$ and $\mathbf{E}$, yielding the formulae presented in the main
text in Eqs. (15) and (17), respectively.

As a last task, as our hypothetical null model, we wish 
to find the information measures for the corresponding 
renewal process approximation. The ISI distribution of 
the alternating renewal process is: 


(/(t) = 






( 1 ) 




( 2 ) 


and its survival function is: 

^(2)$(l)(t)+^(l)$(2)(i) 


$(t) = 


/rH) -I- ^( 2 ) 


Hence, its mean firing rate is: 


M = 




^(2)/^(l) +^(i)/^(2) ■ 


(A3) 


(A4) 


(A5) 


From Sec. III, the entropy rate of the corresponding renewal process is:

/iy“(At) 1 

—At-^ + ; 


compare Eq. (15). And, the statistical complexity of the 


corresponding renewal process is: 


/^ren 

'-'M 


(At) - log 2 ^ -k i?[/r$(t)] . 


The rate of divergence of $C_\mu^{\mathrm{ren}}(\Delta t)$ is half the
rate of divergence of the true $C_\mu(\Delta t)$, as given in Eq. (16). Trivial


manipulations, starting from 0 < ~ ’ ii^ply 

that the rate of entropy-rate divergence is always less than or equal to
the mean firing rate for an alternating renewal process. Jensen's
inequality implies that each of the nondivergent components of these
information measures for the renewal process is less than or equal to that
of the alternating renewal process. The Data Processing Inequality also
implies that the excess entropy calculated by assuming a renewal process
is a lower bound on the true process' excess entropy.
Appendix B: Simplicity in Complex Neurons


Recall that our white noise-driven linear leaky integrate-and-fire (LIF)
neuron has governing equation:

$$\dot{V} = b - V + a\,\eta(t) , \qquad \text{(B1)}$$

and, when $V = 1$, a spike is emitted and $V$ is instantaneously reset to
0. We computed ISI survival functions from empirical histograms of 10^
ISIs. These ISIs were obtained by simulating Eq. (B1) in Python/NumPy
using an Euler integrator with time discretization of 1/1000 of
$\log (b/(b-1))$, which is the ISI in the noiseless limit.

The white noise-driven quadratic integrate-and-fire (QIF) neuron has
governing equation:

$$\dot{V} = b + V^2 + a\,\eta(t) , \qquad \text{(B2)}$$

and, when $V = 100$, a spike is emitted and $V$ is instantaneously reset
to $-100$. We computed ISI survival functions also from empirical
histograms of trajectories with 10^ ISIs. These ISIs were obtained by
simulating Eq. (B2) in Python/NumPy using an Euler stochastic integrator
with time discretization of 1/1000 of $\pi/\sqrt{b}$, which is the ISI in
the noiseless limit when threshold and reset voltages are $+\infty$ and
$-\infty$, respectively.
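
The simulation procedure just described can be sketched as follows. This is a minimal illustration and not the original simulation code. It assumes that $\eta(t)$ is unit-intensity Gaussian white noise, so that each Euler step adds a Gaussian increment of standard deviation $a\sqrt{dt}$. The function name `simulate_isis` and its arguments are hypothetical.

```python
import numpy as np

def simulate_isis(drift, a, v_reset, v_thresh, dt, n_spikes, seed=0):
    """Euler-Maruyama simulation of dV/dt = drift(V) + a*eta(t) with
    threshold-and-reset; returns an array of n_spikes interspike intervals."""
    rng = np.random.default_rng(seed)
    noise_scale = a * np.sqrt(dt)    # std of the noise increment per step
    isis = np.empty(n_spikes)
    v, elapsed, count = v_reset, 0.0, 0
    while count < n_spikes:
        v += drift(v) * dt + noise_scale * rng.standard_normal()
        elapsed += dt
        if v >= v_thresh:            # spike: record the ISI and reset
            isis[count] = elapsed
            count += 1
            v, elapsed = v_reset, 0.0
    return isis

# LIF neuron of Eq. (B1), with the time step set to 1/1000 of the noiseless ISI.
b, a = 2.0, 0.5
dt = np.log(b / (b - 1.0)) / 1000.0
isis = simulate_isis(lambda v: b - v, a, 0.0, 1.0, dt, n_spikes=200)
mu = 1.0 / isis.mean()               # sample mean firing rate
cv = isis.std() / isis.mean()        # sample ISI coefficient of variation
print(f"mu = {mu:.3f}, C_V = {cv:.3f}")
```

The QIF neuron of Eq. (B2) would use the same routine with `drift=lambda v: b + v * v`, `v_reset=-100.0`, `v_thresh=100.0`, and `dt = np.pi / np.sqrt(b) / 1000.0`. A pure-Python loop is slow for large ISI counts, so a production run would likely vectorize over many independent trajectories.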

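Figure 6 shows estimates of the following continuous-time information
measures from this simulated data as they vary with mean firing rate
$\mu$ and ISI coefficient of variation $C_V$. This required us to
estimate $\mu$, $C_V$, and:

$$C_\mu^{CT} := \lim_{\Delta t \to 0} \left( C_\mu(\Delta t) + \log_2 \Delta t \right) ,$$

$$\mathbf{E}^{CT} := \lim_{\Delta t \to 0} \mathbf{E}(\Delta t) ,$$

$$h_\mu^{CT} := \lim_{\Delta t \to 0} \left( \frac{h_\mu(\Delta t)}{\Delta t} + \mu \log_2 \Delta t \right) , \text{ and}$$

$$b_\mu^{CT} := \lim_{\Delta t \to 0} \frac{b_\mu(\Delta t)}{\Delta t} ,$$

where the superscript CT is a reminder that these are appropriately
regularized information measures in the continuous-time limit.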

We estimated $\mu$ and $C_V$ using the sample mean and sample coefficient
of variation with sufficient samples so that error bars (based on studying
errors as a function of data size) were negligible. The information
measures required new estimators, however. From the formulae in Sec. III
we see that:


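$$C_\mu^{CT} = \log_2 \frac{1}{\mu} - \mu \int_0^\infty \Phi(t) \log_2 \Phi(t)\,dt , \qquad \text{(B3)}$$

$$\mathbf{E}^{CT} = \int_0^\infty \mu t\,\phi(t) \log_2\!\left(\mu \phi(t)\right) dt
 - 2 \int_0^\infty \mu \Phi(t) \log_2\!\left(\mu \Phi(t)\right) dt , \qquad \text{(B4)}$$

$$h_\mu^{CT} = -\mu \int_0^\infty \phi(t) \log_2 \phi(t)\,dt , \text{ and} \qquad \text{(B5)}$$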

. pOO pOO 

M J + t')dt'dt 

+ log2 (/)(f)log2/>(f)df) . 

(B6) 


It is well known that the sample mean is a consistent estimator of the
true mean, that the empirical cumulative distribution function is a
consistent estimator of the true cumulative distribution function almost
everywhere, and thus that the empirical ISI distribution is a consistent
estimator of the true ISI distribution almost everywhere. In estimating
the empirical cumulative distribution function, we introduced a cubic
spline interpolator. This is still a consistent estimator as long as
$\Phi(t)$ is three-times differentiable, which is the case for ISI
distributions from integrate-and-fire neurons. We then have estimators of
$C_\mu^{CT}$, $\mathbf{E}^{CT}$, $h_\mu^{CT}$, and $b_\mu^{CT}$ that are
based on consistent estimators of $\mu$, $\Phi(t)$, and $\phi(t)$ and that
are likewise consistent.
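
A plug-in estimate of this kind can be sketched in a few lines. The example below is an illustration under simplifying assumptions, not the estimator used for the figures. It replaces the cubic-spline interpolation with the piecewise-constant empirical survival function, and it estimates only the integral $-\mu \int_0^\infty \Phi(t) \log_2 \Phi(t)\,dt$ that appears in Eq. (B3). The function name `survival_entropy_plugin` is hypothetical.

```python
import numpy as np

def survival_entropy_plugin(isis):
    """Plug-in estimate of -mu * integral of Phi(t) log2 Phi(t) dt, using the
    sample mean for 1/mu and the step-function empirical survival function."""
    t = np.sort(np.asarray(isis, dtype=float))
    n = t.size
    mu_hat = 1.0 / t.mean()
    gaps = np.diff(np.concatenate(([0.0], t)))  # widths between order statistics
    s = 1.0 - np.arange(n) / n                  # empirical survival on each gap
    integral = np.sum(gaps * s * np.log2(s))    # s = 1 on the first gap gives 0
    return -mu_hat * integral

rng = np.random.default_rng(0)
for mean_isi in (1.0, 0.25):
    sample = rng.exponential(mean_isi, size=200_000)
    cv = sample.std() / sample.mean()
    print(f"mean ISI {mean_isi}: C_V = {cv:.3f}, "
          f"estimate = {survival_entropy_plugin(sample):.3f} bits")
```

For exponentially distributed ISIs the integral equals $1/\ln 2$ bits regardless of the firing rate, which makes this case a convenient check that the estimate does not depend on $\mu$.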

We now discuss the finding evident in Fig. 6 that the quantities
$\lim_{\Delta t \to 0} \mathbf{E}(\Delta t)$ and
$\lim_{\Delta t \to 0} C_\mu(\Delta t) + \log_2(\mu \Delta t)$ depend only
on the ISI coefficient of variation $C_V$ and not the mean firing rate
$\mu$. Presented in a different way, this is not so surprising. First, we
use Ref. [49]'s expression for $C_\mu$ to rewrite:


Qi = ^im^ {C^{At) + log2(AtAt)) 


= -M / 
Jo 

and Eq. ®> to rewrite: 


$(t) log2$(<)df 


Q 2 


lim E(At) 

At-).0 


2Qi -l- 



fit(j){t)\0g2{n4’{i))dt ■ 


So, we only need to show that —^ ^{t) log 2 ^{t)dt and 
Jp ^t(j){t)\og 2 (^ 4 >(t))dt are independent of /r for two- 
parameter families of ISI distributions. 

Consider a change of variables from $t$ to $t' = \mu t$; then:

$$Q_1 = -\int_0^\infty \Phi(t'/\mu) \log_2 \Phi(t'/\mu)\, dt' \tag{B7}$$

and

$$Q_2 = 2 Q_1 + \int_0^\infty t'\, \frac{\phi(t'/\mu)}{\mu} \log_2 \frac{\phi(t'/\mu)}{\mu}\, dt' . \tag{B8}$$

For all of the ISI distributions considered here, $\phi(t'/\mu)/\mu$ is still part of the same two-parameter family as $\phi(t)$, except that its mean firing rate is $1$ rather than $\mu$. Its $C_V$ is unchanged. Hence, $Q_1$ and $Q_2$ are the same for renewal processes with mean firing rates $1$ and $\mu$, as long as $C_V$ is held constant. It follows that $\lim_{\Delta t \to 0} E(\Delta t)$ and $\lim_{\Delta t \to 0} \left[ C_\mu(\Delta t) + \log_2(\mu \Delta t) \right]$ are independent of $\mu$ and depend only on $C_V$ for the two-parameter families of ISI distributions considered in Sec. Similar arguments apply to understanding the universal $C_V$-dependence of $\lim_{\Delta t \to 0} b_\mu(\Delta t)/\mu \Delta t$ and $\lim_{\Delta t \to 0} \left[ h_\mu(\Delta t)/\mu \Delta t + \log_2(\mu \Delta t) \right]$.
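The scaling argument can be checked numerically. The following sketch assumes a gamma ISI family with $C_V \le 1$ and evaluates $Q_1$ and $Q_2$ by midpoint quadrature at two very different firing rates. The names `xlog2x` and `q1_q2` and the grid parameters are illustrative, and $Q_2$ is written with the integrand $\mu t \phi(t) \log_2(\phi(t)/\mu)$, which is the rate-independent combination.

```python
import math
import numpy as np

def xlog2x(x):
    """Elementwise x*log2(x) with the convention 0*log2(0) = 0."""
    out = np.zeros_like(x)
    pos = x > 0
    out[pos] = x[pos] * np.log2(x[pos])
    return out

def q1_q2(mu, cv, n=400_000):
    """Q1 and Q2 for a gamma ISI density (assumes cv <= 1)."""
    k = 1.0 / cv**2                    # gamma shape
    theta = 1.0 / (mu * k)             # gamma scale, mean ISI = 1/mu
    dt = 40.0 / mu / n                 # grid spans 40 mean ISIs
    t = (np.arange(n) + 0.5) * dt      # midpoints
    phi = np.exp((k - 1) * np.log(t) - t / theta
                 - math.lgamma(k) - k * math.log(theta))
    Phi = np.clip(1.0 - (np.cumsum(phi) - 0.5 * phi) * dt, 0.0, 1.0)
    q1 = -mu * np.sum(xlog2x(Phi)) * dt
    # mu*t*phi*log2(phi/mu) = mu^2 * t * (phi/mu)*log2(phi/mu)
    q2 = 2.0 * q1 + mu**2 * np.sum(t * xlog2x(phi / mu)) * dt
    return q1, q2

# Same C_V, firing rates differing by a factor of 40.
slow = q1_q2(0.5, 0.5)
fast = q1_q2(20.0, 0.5)
print(slow, fast)
```

Because the grid is defined in units of the mean ISI, the two evaluations differ only by floating-point round-off, which mirrors the change of variables $t' = \mu t$.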

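In Fig. we also see that $E$ seems to diverge as $C_V \to 0$. Consider the following plausibility argument, which suggests that it diverges as $\log_2 (1/C_V)$ as $C_V \to 0$. These two-parameter ISI distributions with finite mean firing rate $\mu$ and small $C_V \ll 1$ can be approximated as Gaussians with mean $1/\mu$ and standard deviation $C_V/\mu$. Recall from Eq. that we have:

$$
\begin{aligned}
E &= -2 \int_0^\infty \mu \Phi(t) \log_2 \left( \mu \Phi(t) \right) dt + \int_0^\infty \mu t \phi(t) \log_2 \left( \mu \phi(t) \right) dt \\
  &= -\log_2 \mu - 2 \mu \int_0^\infty \Phi(t) \log_2 \Phi(t)\, dt + \mu \int_0^\infty t \phi(t) \log_2 \phi(t)\, dt .
\end{aligned}
$$

Note that as $C_V \to 0$:

$$\Phi(t) \to \begin{cases} 1 & t < 1/\mu \\ 0 & t \geq 1/\mu \end{cases} \tag{B9}$$

and so:

$$\lim_{C_V \to 0} \int_0^\infty \Phi(t) \log_2 \Phi(t)\, dt = 0 .$$

We assume that for small $C_V$, we can approximate:

$$\phi(t) \approx \frac{\mu}{\sqrt{2 \pi C_V^2}} \exp \left( -\frac{(\mu t - 1)^2}{2 C_V^2} \right) ,$$

which then implies that:

$$\mu \int_0^\infty t \phi(t) \log_2 \phi(t)\, dt \approx \log_2 \frac{\mu}{\sqrt{2\pi}\, C_V} - \frac{1}{2 \ln 2} . \tag{B10}$$

So, for any ISI distribution tightly distributed about its mean ISI, we expect:

$$E \approx \log_2 \frac{1}{C_V} - \frac{1}{2} \log_2 (2 \pi e) ,$$

so that $E$ diverges in this way. A similar asymptotic analysis also shows that as $C_V \to 0$,

$$\lim_{\Delta t \to 0} \frac{b_\mu(\Delta t)}{\Delta t} \approx \frac{\mu}{\ln 2} \left( \frac{1}{2 C_V^2} - \frac{1}{2} \right) , \tag{B11}$$

thereby explaining the divergence of $\lim_{\Delta t \to 0} b_\mu(\Delta t)/\Delta t$ evident in Fig.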
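The small-$C_V$ behavior of $E$ can be illustrated numerically. The sketch below assumes a gamma ISI family as a stand-in for "tightly distributed" ISIs, evaluates $E$ by midpoint quadrature, and compares it with the Gaussian-approximation value $\log_2(1/C_V) - \tfrac{1}{2}\log_2(2\pi e)$. The function names and grid choices are illustrative. The comparison is only approximate at finite $C_V$, since the argument drops the $\Phi \log_2 \Phi$ term, which vanishes only in the limit.

```python
import math
import numpy as np

def xlog2x(x):
    """Elementwise x*log2(x) with the convention 0*log2(0) = 0."""
    out = np.zeros_like(x)
    pos = x > 0
    out[pos] = x[pos] * np.log2(x[pos])
    return out

def excess_entropy_gamma(mu, cv, n=400_000):
    """E for a gamma ISI density with small cv, by midpoint quadrature."""
    k = 1.0 / cv**2
    theta = 1.0 / (mu * k)
    dt = (1.0 + 12.0 * cv) / mu / n    # grid reaches 12 std. devs. past the mean
    t = (np.arange(n) + 0.5) * dt
    phi = np.exp((k - 1) * np.log(t) - t / theta
                 - math.lgamma(k) - k * math.log(theta))
    Phi = np.clip(1.0 - (np.cumsum(phi) - 0.5 * phi) * dt, 0.0, 1.0)
    return (np.sum(t * xlog2x(mu * phi)) * dt
            - 2.0 * np.sum(xlog2x(mu * Phi)) * dt)

def small_cv_asymptote(cv):
    """Gaussian-approximation value of E for C_V << 1."""
    return math.log2(1.0 / cv) - 0.5 * math.log2(2.0 * math.pi * math.e)

for cv in (0.1, 0.05, 0.02):
    gap = excess_entropy_gamma(1.0, cv) - small_cv_asymptote(cv)
    print(f"C_V = {cv}: E - asymptote = {gap:.3f} bits")
```

The printed gap should shrink as $C_V$ decreases, which is the sense in which $E$ diverges as $\log_2(1/C_V)$.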

Finally, a straightforward argument shows that $C_\mu$ is minimized at fixed $\mu$ when the neural spike train is periodic. We can rewrite $C_\mu$ in the infinitesimal time resolution limit as:

$$C_\mu(\Delta t) \approx \log_2 \frac{1}{\mu \Delta t} + \mu \int_0^\infty \Phi(t) \log_2 \frac{1}{\Phi(t)}\, dt .$$

Note that $0 \leq \Phi(t) \leq 1$, and so $\int_0^\infty \Phi(t) \log_2 \frac{1}{\Phi(t)}\, dt \geq 0$. We set it equal to zero by using the step function given in Eq. (B9), which corresponds to a noiseless periodic process. So, the lower bound on $C_\mu(\Delta t)$ is $\log_2 (1/\mu \Delta t)$, and this bound is achieved by a periodic process.